Model Averaging via Penalized Regression for Tracking Concept Drift

نویسندگان

  • Kyupil Yeon
  • Moon Sup Song
  • Yongdai Kim
  • Hosik Choi
  • Cheolwoo Park
چکیده

A supervised learning algorithm aims to build a prediction model using training examples. This paradigm typically has the assumptions that the underlying distribution and the true input-output dependency does not change. However, these assumptions often fail to hold, especially in data streams. This phenomenon is known as concept drift. We propose a new model combining algorithm for tracking concept drift in data streams. The final predictive ensemble model has a form of a weighted average and ridge regression combiner. The coefficients of the combiner are determined by ridge regression with the constraints such that the coefficients are nonnegative and sum to one. The proposed algorithm is devised via a new measure of concept drift, the angle between the estimated weights from data and the optimal weight vector obtained under no concept drift. It is shown that the ridge tuning parameter plays a crucial role of forcing the proposed algorithm to adapt to concept drift. Our main findings include (i) the proposed algorithm can achieve the optimal weights in the case of no concept drift if the tuning parameter is sufficiently large, and (ii) the angle is monotonically increasing as the tuning parameter decreases. These imply that if the tuning parameter is well-controlled, the algorithm can produce weights which reflect the degree of concept drift measured by the angle. Using various numerical examples, it is shown that the proposed algorithm can track concept drift better than other existing ensemble methods. Supplemental materials computer code and R-package are available online.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Tracking concept drift using a constrained penalized regression combiner

The objective of this work is to develop a predictive model when data batches are collected in a sequential manner. With streaming data, information is constantly being updated and a major statistical challenge for these types of data is that the underlying distribution and the true input-output dependency might change over time, a phenomenon known as concept drift. The concept drift phenomenon...

متن کامل

Penalized Bregman Divergence Estimation via Coordinate Descent

Variable selection via penalized estimation is appealing for dimension reduction. For penalized linear regression, Efron, et al. (2004) introduced the LARS algorithm. Recently, the coordinate descent (CD) algorithm was developed by Friedman, et al. (2007) for penalized linear regression and penalized logistic regression and was shown to gain computational superiority. This paper explores...

متن کامل

Penalized Estimators in Cox Regression Model

The proportional hazard Cox regression models play a key role in analyzing censored survival data. We use penalized methods in high dimensional scenarios to achieve more efficient models. This article reviews the penalized Cox regression for some frequently used penalty functions. Analysis of medical data namely ”mgus2” confirms the penalized Cox regression performs better than the cox regressi...

متن کامل

An improved model averaging scheme for logistic regression

Recently, penalized regression methods have attracted much attention in the statistical literature. In this article, we argue that such methods can be improved for the purposes of prediction by utilizing model averaging ideas. We propose a new algorithm that combines penalized regression with model averaging for improved prediction. We also discuss the issue of model selection versus model aver...

متن کامل

Detecting Concept Drift in Data Stream Using Semi-Supervised Classification

Data stream is a sequence of data generated from various information sources at a high speed and high volume. Classifying data streams faces the three challenges of unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, commonly the stream is divided into fixed size windows or gradual forgetting is used. Concept drift refer...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010